Awesome Linguistics Resources for Spanish
      
    
    A curated list of linguistic resources for Spanish NLP and computational linguistics (CL).
    Clustering
    
    Speech
    
    
      Part of Speech Taggers (POS Taggers)
    
    
    
    
    Named Entity Recognition (NER)
    
    Corpora
    Shared tasks
    
    Corpora
    
      - 
        Multilingual Aligned Annotated Corpus (CRATER)
      
 
      - 
        UAM Treebank - 1,500 syntactically annotated sentences extracted from
          newspapers (El País Digital and Compra Maestra)
      
 
      - 
        POS-tagged/syntactic dependencies - European Corpus Initiative
          Multilingual Corpus I
      
 
      - 
        The Corpus of Contemporary Spanish (POS tags, lemmas)
      
 
      - 
        Lemmas Dictionary
      
 
      - 
        esTenTen Spanish (POS-tagged)
      
 
      - 
        Europarl Corpus (Parallel Corpus English-Spanish)
      
 
      - 
        Colombian Political Speeches
      
 
      - 
        South American Slang Expressions/MTWE
      
 
      - 
        Syntactic and Semantic Annotations (subset of the AnCora Corpus)
      
 
      - 
        Plurilingual Specific Corpus on Economics, Medicine, Computer
          Science
      
 
      - 
        Copenhagen Treebank (Dependency Parsing)
      
 
      - 
        Reuters Corpora RCV2 - News Corpora
      
 
      - 
        MolinoLabs Corpus - News Corpora from Spain, Argentina and Mexico
      
 
      - 
        PANACEA - Legislation Corpus
      
 
      - 
        PANACEA - Legislation N-gram Corpus
      
 
      - 
        PANACEA - Dependency Parsed Corpus
      
 
      - 
        PANACEA - Monolingual Lexica (MWE, Frames, Semantic Classes)
      
 
      - 
        Opinion Mining - User reviews of cars, hotels, washing machines,
          books, cell phones, music, etc.
      
 
      - 
        Cross Lingual Textual Entailment (CLTE) Corpus (English-Spanish)
      
 
      - 
        N-gram Frequencies from Colombian News Corpora
      
 
      - 
        Sagan Textual Entailment Test Suite
      
 
      - 
        Garcia, Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish
          biographical relation extraction corpora (Garcia, Marcos and Pablo
          Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for
          Biographical Relation Extraction. Natural Language Engineering,
          CJO2013. doi:10.1017/S1351324913000314.)
      
 
      - 
        Garcia, Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and
          Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014.
          Multilingual corpora with coreferential annotation of person entities.
          In Proceedings of the 9th edition of the Language Resources and
          Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)
      
 
      - 
        COW (Corpora from the Web) N-gram/Annotated People’s Name Corpora
      
 
      - 
        Wikicorpus - Portion of the 2006 Wikipedia annotated with WordNet
          synsets and POS tags
      
 
      - 
        Spanish Billion Words Corpus with word2vec Embeddings
      
 
    
    Misc
    
    Contribute
    
      Contributions welcome! Read the
      contribution guidelines first.
    
    License
    
      
    
    
      To the extent possible under law,
      David Przybilla has waived all
      copyright and related or neighboring rights to this work.